Abstract

We study the problem of computing a minimum equivalent digraph (also known as the
problem of computing a strong transitive reduction) and its maximum objective function variant,
with two types of extensions. First, we allow a set D ⊆ E to be declared and require that a valid
solution A satisfy D ⊆ A (this is sometimes called the transitive reduction problem). In the second
extension (called p-ary transitive reduction), we have an integer edge labeling and we view two
paths as equivalent if they have the same beginning, the same ending and the same sum of labels
modulo p. A solution A ⊆ E is valid if it gives an equivalent path for every original path. For all
problems we establish the following: polynomial time minimization of |A| within ratio 1.5,
maximization of |E − A| within ratio 2, and MAX-SNP hardness even if the length of simple cycles
is limited to 5. Furthermore, we believe that the combinatorial technique behind the approximation
algorithm for the minimization version might be of interest for other graph connectivity problems as well.
1 Introduction



1.1 Definitions and motivation 

Minimum equivalent digraph is a classic computational problem (cf. [13]) with several recent
extensions motivated by applications in social sciences, systems biology, etc.

The statement of the equivalent digraph problem is simple. For a digraph (V, E) the transitive
closure of E is the relation "E contains a path from u to v". In turn, A is an equivalent digraph for E
if (a) A ⊆ E and (b) the transitive closures of A and E are the same.
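The two conditions can be checked directly on small instances. The following is a minimal sketch, not part of the algorithms of this paper; the function names `transitive_closure` and `is_equivalent_digraph` are hypothetical, and the closure is computed by a plain search from every node, which is adequate only for illustration.

```python
def transitive_closure(nodes, edges):
    """Return the set of pairs (u, v) such that `edges` contains a path from u to v."""
    succ = {u: [] for u in nodes}
    for a, b in edges:
        succ[a].append(b)
    pairs = set()
    for s in nodes:
        seen, stack = set(), list(succ[s])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(succ[x])
        pairs.update((s, t) for t in seen)
    return pairs


def is_equivalent_digraph(nodes, edges, subset):
    """Check (a) subset is contained in edges and (b) both have the same closure."""
    edges, subset = set(edges), set(subset)
    return (subset <= edges
            and transitive_closure(nodes, subset) == transitive_closure(nodes, edges))


V = [1, 2, 3]
E = [(1, 2), (2, 3), (3, 1), (1, 3)]
# Dropping the chord (1, 3) keeps every path relation.
print(is_equivalent_digraph(V, E, [(1, 2), (2, 3), (3, 1)]))
# Dropping (2, 3) leaves node 2 without any outgoing path.
print(is_equivalent_digraph(V, E, [(1, 2), (1, 3), (3, 1)]))
```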

The assumption that the valid solutions are the equivalent digraphs of E yields two different
optimization problems when we define two objective functions: Min-ED, in which we minimize
|A|, and Max-ED, in which we maximize |E − A|, where A is an equivalent digraph for E.

Skipping condition (a) yields the transitive reduction problem, which was solved optimally by Aho et
al. This could motivate renaming the equivalent digraph as a strong transitive reduction [14].

In the study of biological systems networks of interactions are considered, e.g. nodes can be 
genes and an edge (u,v) means that gene u regulates gene v. Without going into biological details, 
regulations may mean at least two different things: when u is expressed, i.e. molecules of the 
protein coded by u are created, the expression of v can be repressed or promoted. A path in this 
network is an indirect interaction, and promoting a repressor represses, while repressing a repressor 
promotes (biologists also used the term de-repression). Interactions of such two types can appear 
in other contexts as well, including social networks. This motivates an extension of the notion of 
digraph and its transitive closure described in points (1)-(4) below.

Moreover, for certain interactions we have direct evidence, so an instance description includes a
set D ⊆ E of edges which have to be present in every valid solution. Formally, we define A to be a
valid solution to instance (V, E, ℓ, D) as follows:

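(1) $\ell : E \to \mathbb{Z}_p$;

(2) a path $P = (u_0, u_1, \ldots, u_k)$ has characteristic $\ell(P) = \sum_{i=1}^{k} \ell(u_{i-1}, u_i) \bmod p$;

(3) $\mathrm{Closure}_\ell(E) = \{(u, v, q) : \exists P \text{ in } E \text{ from } u \text{ to } v \text{ and } \ell(P) = q\}$;

(4) $A$ is a p-ary transitive reduction of $E$ with a required subset $D$ if $D \subseteq A \subseteq E$ and
$\mathrm{Closure}_\ell(A) = \mathrm{Closure}_\ell(E)$.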
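The closure of a labeled edge set can be computed by a search over pairs (node, residue). The sketch below is only an illustration under stated assumptions: paths are allowed to repeat nodes and must contain at least one edge, and the names `closure_mod_p` and `is_p_ary_reduction` are hypothetical.

```python
from collections import deque


def closure_mod_p(nodes, label, p):
    """label maps each edge (u, v) to an integer; returns the set of triples (u, v, q)."""
    succ = {u: [] for u in nodes}
    for (a, b), lab in label.items():
        succ[a].append((b, lab % p))
    triples = set()
    for s in nodes:
        seen = set()
        queue = deque(succ[s])           # states are (node, sum of labels mod p)
        while queue:
            state = queue.popleft()
            if state in seen:
                continue
            seen.add(state)
            x, q = state
            for y, lab in succ[x]:
                queue.append((y, (q + lab) % p))
        triples.update((s, v, q) for v, q in seen)
    return triples


def is_p_ary_reduction(nodes, label, kept, required, p):
    """Check that required is within kept, kept is within E, and the closures agree."""
    kept = set(kept)
    if not (set(required) <= kept <= set(label)):
        return False
    sub = {e: label[e] for e in kept}
    return closure_mod_p(nodes, sub, p) == closure_mod_p(nodes, label, p)


nodes = ["a", "b", "c"]
# p = 2: label 1 for "represses", 0 for "promotes"; a -> b -> c has parity 0, like a -> c.
label = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 0}
print(is_p_ary_reduction(nodes, label, [("a", "b"), ("b", "c")], [], 2))
```

If the direct edge (a, c) had label 1 instead, it would carry a characteristic that no other path provides, and it could not be deleted.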

Our two objective functions define optimization problems Min-TR_p and Max-TR_p.

1.2 Earlier results 

The initial work on Min-ED by Moyles and Thompson [13] described an efficient reduction to the
case of strongly connected graphs and an exact exponential time algorithm for the latter.

Several approximation algorithms for Min-ED were described: by Khuller et al. [10], with
approximation ratio 1.617 + ε, and by Vetta [15], with approximation ratio 1.5. The latter result did
not have a full peer review, however.

If edges have costs, we can minimize c(A) within factor 2 using an algorithm for minimum cost
rooted arborescence [6], [8] of Edmonds (who described it) and Karp (who simplified it). We find
minimum cost in- and out-arborescences with respect to an arbitrary root r ∈ V; each of them costs
no more than an optimum solution, and their union is a valid solution.

Albert et al. [2] showed how to convert an algorithm for Min-ED with approximation ratio
r into an algorithm for Min-TR_1 with approximation ratio 3 − 2/r. They have also shown a
(2 + o(1))-approximation for Min-TR_p when p is a prime. Other heuristics for these problems were
investigated in [3], [7].

On the hardness side, Papadimitriou [14] formulated an exercise to show that strong transitive
reduction is NP-hard; Khuller et al. proved it formally and also showed MAX-SNP hardness.
Motivated by their cycle contraction method in [10], they were interested in the complexity of
the problem when there is an upper bound γ on the cycle length; in [9] they showed that Min-ED
is polynomial with γ = 3, NP-hard with γ = 5 and MAX-SNP-hard with γ = 17.

1.3 Results in this paper 

We show an approximation algorithm for Min-ED with ratio 1.5. We use a method somewhat
similar to that of Vetta [15], but our combinatorial lower bound makes a more explicit use of
the primal-dual formulation of Edmonds and Karp, and this makes it much easier to justify edge
selections within the promised approximation ratio (see footnote 1).

Next, we show how to modify that algorithm to approximate Min-TR_1 within ratio 1.5. One
surely cannot use a method for Min-ED as a "black box" because we need to control which edges 
we keep and which we delete. 

We show an approximation algorithm with ratio 2 for Max-TR_1. While it was shown by Albert
et al. [3] that a simple greedy algorithm (delete an unnecessary edge as long as one exists) yields
a ratio 3 approximation, it is easy to provide an example of a Max-ED instance with n nodes and
2n − 2 edges in which greedy removes only one edge, while the optimum solution removes n − 2
edges. Other known algorithms for Min-ED are not much better in the worst case when applied
to Max-ED.
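The greedy heuristic mentioned above can be sketched as follows. This is only an illustration: the names `reach_closure` and `greedy_reduction` are hypothetical, edges are tried in the order given (the outcome depends on that order), and the closure is recomputed from scratch for every trial. A single pass suffices because an edge that cannot be deleted from the current set cannot become deletable after further deletions.

```python
def reach_closure(nodes, edges):
    """Set of pairs (u, v) such that `edges` contains a path from u to v."""
    succ = {u: set() for u in nodes}
    for a, b in edges:
        succ[a].add(b)
    pairs = set()
    for s in nodes:
        seen, stack = set(), list(succ[s])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(succ[x])
        pairs |= {(s, t) for t in seen}
    return pairs


def greedy_reduction(nodes, edges, required=()):
    """Delete unnecessary edges one by one; edges in `required` are never deleted."""
    kept = list(edges)
    target = reach_closure(nodes, kept)
    required = set(required)
    for e in list(edges):
        if e in required:
            continue
        trial = [f for f in kept if f != e]
        if reach_closure(nodes, trial) == target:
            kept = trial          # e is unnecessary, drop it for good
    return kept


V = [1, 2, 3]
E = [(1, 2), (2, 3), (3, 1), (1, 3)]
print(greedy_reduction(V, E))                      # the chord (1, 3) is removed
print(greedy_reduction(V, E, required=[(1, 3)]))   # nothing can be removed
```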

We show that for prime p we can transform an equivalent digraph that contains the required
edges into a p-ary transitive reduction by a single edge insertion per strongly connected component.
Because every p-ary transitive reduction is also an equivalent digraph, this transformation implies
approximation algorithms for Min-TR_p with ratio 1.5 and for Max-TR_p with ratio 2; we can
compensate for the insertion of a single edge, so the ratio does not change (see footnote 2).

We simplify the MAX-SNP hardness proof for Min-ED (the proof applies to Max-ED as well)
so it applies even when γ, the maximum cycle length, is 5. This leaves open only the case of γ = 4.

1.4 Some Motivations and Applications 

Application of Min-ED: Connectivity Requirements in Computer Networks. Khuller 
et al. [9] mentioned applications of Min-ED to design of computer networks that satisfy given 
connectivity requirements. 

If a set of connections exists already, then this application motivates Min-TR_1 (cf. [11]).

Application of Min-TR_1: Social Network Analysis and Visualization. Min-TR_1 can be
applied to social network analysis and visualization. For example, Dubois and Cecile [5] apply
Min-TR_1 to the social network built upon interaction data (email boxes) of Enron corporation to
study general properties (such as scale-freeness) of such networks and to help in the visualization
process. They use a straightforward greedy approach which, as we have discussed, has inferior
performance, both for Min-TR_1 and Max-TR_1.

Application of Min-TR_2: Inferring Biological Signal Transduction Networks. In
subsection 1.1 we motivated Min-TR_2 with the study of gene regulatory networks. The same issues
apply to other cellular interactions, like signal transduction networks, and they were addressed by
Albert et al. [3] with the use of an approximation algorithm for Min-TR_2.

1 It appears that the approach of [15] may be correct, but the proofs and the description of the algorithm seem 
to have some gaps. It is somewhat difficult to point out these gaps without going into technical details; we will point 
out some problems as an illustration in the proof section of the paper. 

2 Albert et al. [2] did not use this approach and tried to approximate Min-TR_p for p > 1 directly, thus obtaining
a (2 + o(1))-approximation for prime p.
1.5 Our Techniques

Approximation algorithm for the Min objective. Vetta used a primal/dual LP formulation
for Min-ED, more precisely, a solution that satisfies a subset of linear constraints; the optimum
solution for that subset is integral and can be found using a maximum matching. We observed
that a larger set of constraints also has this property, and that the extra edges justified by this
extension make it much easier to analyze the algorithm.

To tackle the Min-TR_1 problem we had to justify yet more edges, as the algorithm is not allowed
to delete the required edges. We showed that we can use a yet larger set of constraints, with a 
"good enough" solution that can be found using a maximum weight matching. 

We also used depth first search in a manner inspired by Tarjan's algorithm for finding strongly 
connected components. 

We also show an inherent limitation of our approach by showing that the integrality gap of the LP
relaxation of the above IP is at least 4/3.

Approximation algorithm for the Max objective. For Max-TR_1, we utilize the
integrality of the polytope of the rooted arborescence problem to provide a 2-approximation. We also
observe that the integrality gap of the LP relaxation of the IP formulation is at least 3/2.

The p-ary case for prime p. We show that we can solve an instance of Min-TR_p/Max-TR_p
by solving a related instance of Min-TR_1/Max-TR_1 and inserting an appropriately chosen single
edge. In conjunction with our above results, this leads to a 1.5-approximation for Min-TR_p and a
2-approximation for Max-TR_p. This method works only if p is prime.

Inapproximability. We adapt the reduction used by Khuller et al. [10], but we apply it to a very
restricted (and yet, MAX-SNP hard) version of Max-SAT (cf. [1]). 

1.6 Notation 

We use the following additional notations. 

• G = (V, E) is the input digraph; 

• i(U) = {(u,v) ∈ E : u ∉ U & v ∈ U}, i(u_1, ..., u_k) = i({u_1, ..., u_k});

• o(U) = {(u,v) ∈ E : u ∈ U & v ∉ U}, o(u_1, ..., u_k) = o({u_1, ..., u_k});

• scc_A(u) is the strongly connected component containing vertex u in the digraph (V, A);

• T[u] is the node set of the subtree with root u (of a rooted tree T).

2 A Primal-Dual Linear Programming Relaxation of TR_1

Moyles and Thompson [13] showed that Min-ED can be reduced in linear time to the case when
the input graph (V, E) is strongly connected, therefore we will assume that (V, E) is already strongly
connected. In Section 3.2 we will show the same for Min-TR_p.

The minimum cost rooted arborescence problem on G is defined as follows. We are given a
weighted digraph (V, E), a cost function c : E → R_+ and a root node r ∈ V. A valid solution is A ⊆ E
such that in (V, A) there is a path from r to every other node, and we need to minimize c(A). An
LP formulation for this was provided by Edmonds and Karp and goes as follows. As in any other
edge/arc selection problem, we use the linear space with a coordinate for each arc, so edge sets can
be identified with 0-1 vectors, and for an arc e the variable x_e describes whether we select e (x_e = 1)
or not (x_e = 0). Then, the LP formulation is:

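(primal P1)

$$
\begin{aligned}
\text{minimize}\quad & c \cdot x \\
\text{subject to}\quad & x \ge 0, \\
& i(U) \cdot x \ge 1 \quad \text{for all } U \text{ s.t. } \emptyset \subset U \subset V \text{ and } r \notin U \qquad (2.1)
\end{aligned}
$$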
Edmonds [6] and Karp [8] showed that the above LP always has an integral optimal solution and
that we can find it in polynomial time.

We can modify the above LP formulation into an LP formulation for Min-ED provided we set
c(e) = 1 and in (2.1) we remove "and r ∉ U" from the condition. The dual program of this LP
can be constructed by having a vector y that has a coordinate y_U for every ∅ ⊂ U ⊂ V; both the
primal and the dual are written down below for clarity:



We can change P2 into the LP formulation for Max-ED by replacing the objective with "maximize
1 · (1 − x)"; the dual is changed accordingly to reflect this change.

From now on, by a requirement we mean a set of edges R such that a valid solution must intersect
it; in the LP formulation it means that we have a constraint R · x ≥ 1.

We can extend P2 to an LP formulation for TR_1 by adding a one-edge requirement {e} (i.e.
inequality x_e ≥ 1) for each required edge e.

We can obtain a lower bound for solutions of P2 by solving P3, an IP obtained from P2 by
allowing only those requirements R · x ≥ 1 that for some node u satisfy R ⊆ i(u) or R ⊆ o(u). To
find the requirements of P3 efficiently, for each node u we first find the strongly connected
components of V − {u}. Then,

(a) for each source component C we have requirement i(C) ⊆ o(u);

(b) for each sink component C we have requirement o(C) ⊆ i(u);

(c) if we have requirements R ⊂ R' we remove R'.
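Steps (a)-(c) can be sketched as follows. This is an illustration under assumptions, not the efficient procedure: the input is assumed to be a simple, strongly connected digraph, the components are found by mutual reachability (quadratic work, instead of a linear-time method), and the names `sccs` and `p3_requirements` are hypothetical.

```python
def sccs(nodes, edges):
    """Strongly connected components via mutual reachability (small graphs only)."""
    succ = {u: set() for u in nodes}
    for a, b in edges:
        succ[a].add(b)

    def reach(s):
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    r = {u: reach(u) for u in nodes}
    comps, assigned = [], set()
    for u in nodes:
        if u not in assigned:
            comp = frozenset(v for v in r[u] if u in r[v])
            comps.append(comp)
            assigned |= comp
    return comps


def p3_requirements(nodes, edges):
    """Collect the requirements R contained in o(u) or i(u), then apply step (c)."""
    reqs = set()
    for u in nodes:
        rest = [v for v in nodes if v != u]
        inner = [(a, b) for a, b in edges if a != u and b != u]
        for comp in sccs(rest, inner):
            entering = any(a not in comp and b in comp for a, b in inner)
            leaving = any(a in comp and b not in comp for a, b in inner)
            if not entering:      # (a) source component: i(C) consists of edges from u
                req = frozenset((a, b) for a, b in edges if a == u and b in comp)
                if req:
                    reqs.add(req)
            if not leaving:       # (b) sink component: o(C) consists of edges to u
                req = frozenset((a, b) for a, b in edges if b == u and a in comp)
                if req:
                    reqs.add(req)
    # (c) remove every requirement that properly contains another one
    return {R for R in reqs if not any(S < R for S in reqs)}


triangle_with_chord = [(1, 2), (2, 3), (3, 1), (1, 3)]
print(p3_requirements([1, 2, 3], triangle_with_chord))
```

On the triangle with a chord, only the three one-edge requirements of the cycle survive step (c), so the chord (1, 3) is not forced by P3.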

After (c), one-edge requirements are disjoint from other requirements, hence multi-edge requirements
form a bipartite graph (in which connections have the form of shared edges).

3 Minimization algorithms 

3.1 1.5-approximation for Min-ED

Using DFS

One can find an equivalent digraph using depth first search starting at any root node r. Because we
operate in a strongly connected graph, only one root call of the depth first search is required. This
algorithm mimics Tarjan's algorithm for finding strongly connected components and biconnected
components. As usual for depth first search, the algorithm forms a spanning tree T in which we
have an edge (u,v) if and only if Dfs(u) made a call Dfs(v). The invariant is

(A) if Dfs(u) made a call Dfs(v) and Dfs(v) terminated then T[v] ⊆ scc_{T∪B}(u).

(A) implies that (V, T ∪ B) is strongly connected when Dfs(r) terminates. Moreover, in any depth
first search the arguments of calls that already have started and have not terminated yet form a
simple path starting at the root. By (A), every node already visited is, in (V, T ∪ B), strongly
connected to an ancestor who has not terminated. Thus, (A) implies that the strongly connected
components of (V, T ∪ B) form a simple path. This justifies our convention of using the term back
edge for all non-tree edges.

To prove the invariant, we first observe that when Dfs(u) terminates then LowCanDo[u] is 
the lowest number of an end of an edge that starts in T[u]. 



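(primal P2)

$$
\begin{aligned}
\text{minimize}\quad & \mathbf{1} \cdot x \\
\text{subject to}\quad & x \ge 0, \\
& i(U) \cdot x \ge 1 \quad \text{for all } U \text{ s.t. } \emptyset \subset U \subset V
\end{aligned}
$$

(dual D2)

$$
\begin{aligned}
\text{maximize}\quad & \mathbf{1} \cdot y \\
\text{subject to}\quad & y \ge 0, \\
& \sum_{U :\, e \in i(U)} y_U \le 1 \quad \text{for every } e \in E
\end{aligned}
$$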
Dfs(u)
{ Counter ← Counter + 1
  Number[u] ← LowDone[u] ← LowCanDo[u] ← Counter
  for each edge (u,v)                  // scan the adjacency list of u
    if Number[v] = 0
      Insert(T, (u,v))                 // (u,v) is a tree edge
      Dfs(v)
      if LowDone[u] > LowDone[v]
        LowDone[u] ← LowDone[v]
      if LowCanDo[u] > LowCanDo[v]
        LowCanDo[u] ← LowCanDo[v]
        LowEdge[u] ← LowEdge[v]
    else if LowCanDo[u] > Number[v]
      LowCanDo[u] ← Number[v]
      LowEdge[u] ← (u,v)
  // the final check: do we need another back edge?
  if LowDone[u] = Number[u] and u ≠ r
    Insert(B, LowEdge[u])              // LowEdge[u] is a back edge
    LowDone[u] ← LowCanDo[u]
}

T ← B ← ∅
for every node u
  Number[u] ← 0
Counter ← 0
Dfs(r)

Figure 1: Dfs for finding an equivalent digraph of a strongly connected graph 
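For experimentation, the procedure of Figure 1 can be transcribed into Python roughly as follows. This is a sketch under assumptions: the graph is given as a dictionary of adjacency lists and is strongly connected, recursion depth is not a concern, and the function name `equivalent_digraph_dfs` is hypothetical. It implements only the basic Dfs, not the modified algorithm analyzed later.

```python
def equivalent_digraph_dfs(adj, root):
    """adj: dict node -> list of successors. Returns (tree edges T, back edges B)."""
    number = {u: 0 for u in adj}             # 0 means "not visited yet"
    low_done, low_can_do, low_edge = {}, {}, {}
    tree, back = [], []
    counter = 0

    def dfs(u):
        nonlocal counter
        counter += 1
        number[u] = low_done[u] = low_can_do[u] = counter
        low_edge[u] = None
        for v in adj[u]:                      # scan the adjacency list of u
            if number[v] == 0:
                tree.append((u, v))           # (u, v) is a tree edge
                dfs(v)
                if low_done[u] > low_done[v]:
                    low_done[u] = low_done[v]
                if low_can_do[u] > low_can_do[v]:
                    low_can_do[u] = low_can_do[v]
                    low_edge[u] = low_edge[v]
            elif low_can_do[u] > number[v]:
                low_can_do[u] = number[v]
                low_edge[u] = (u, v)
        # the final check: do we need another back edge?
        if low_done[u] == number[u] and u != root:
            back.append(low_edge[u])          # low_edge[u] is a back edge
            low_done[u] = low_can_do[u]

    dfs(root)
    return tree, back


adj = {0: [1, 2], 1: [2, 3], 2: [0, 3], 3: [4], 4: [2]}
T, B = equivalent_digraph_dfs(adj, 0)
print(T, B)
```

Every non-root node contributes one tree edge and at most one back edge, so the result has at most 2(|V| − 1) edges.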



Application of (A) to each child of v shows that T[v] ⊆ scc_{T∪B}(v) when we perform the final
check of Dfs(v).

If the condition of the final check is false, we already have a B edge from T[v] to an ancestor of 
u, and thus we have a path from v to u in T U B. Otherwise, we attempt to insert such an edge. 
If LowCanDo[v] is "not good enough" then there is no path from T[v] to u, a contradiction with 
the assumption that the graph is strongly connected. 

The actual algorithm is based on the above Dfs, but we also need to alter the set of selected 
edges in some cases. 

Objects, credits, debits 

The initial solution L to the system P3 is divided into objects, the strongly connected components of
(V, L). L-edges are either inside objects or between objects. We allocate L-edges to objects, and
give 1.5 € for each. In turn, an object has to pay for the solution edges that connect it: for a T-edge
that enters this object and for a B-edge that connects it to an ancestor. Each solution edge costs 1 €.
Some objects have enough money to pay for all L-edges inside, so they become strongly connected,
and for two more edges of the solution, to enter and to exit. We call them rich. Other objects are poor
and we have to handle them somehow.

When we discuss a small object A, we call it a path node, a digon or a triangle when |A| = 1, 2, 3
respectively.

Allocation of L-edges to objects 

• L-edge inside object A: allocate to A; 
Figure 2: Illustrations for the cases of path nodes and digons (panels Case 1.1 and Case 1.2;
annotations in the figure: "this should be the primary L-edge of t" and "(t,u) is secondary
and makes u rich").



• L-edges from object A: call the first L-edge primary, and the rest secondary;

- primary L-edge A → B, |A| = 1: 1.5 € to A;

- primary L-edge A → B, |A| > 1: 1 € to A, and 0.5 € to B;

- secondary L-edge A → B: 0.5 € to B (1 € to be allocated in the analysis of Min-TR_1).

Later we will formulate Rule ★ to assure a desired property of primary edges.

When is an object A rich? 

1. A is the root object: no payment for incoming and returning edges;

2. |A| ≥ 4: it needs at most L-edges inside, plus two edges, and it has at least 0.5|A| € for these
two edges;

3. if |A| > 1 and an L-edge exits A: it needs at most L-edges inside, plus two edges, and it has
at least (1 + 0.5|A|) € for these two edges;

4. if |A| = 1, 3 and a secondary L-edge enters A;

5. if |A| = 1, 3 and a primary L-edge enters A from some D where |D| > 1.

Guiding Dfs

For a rich object A, use L-edges inside A in our solution, and we consider it in Dfs as a single 
node, with combined adjacency list. This makes point (1) below moot. Otherwise, the preferences 
are in the order: (1) L-edges inside the same object; (2) primary L-edges; (3) other edges. 

Analyzing the balance of poor objects 

A poor object A has parent object C; Dfs enters A from C to node u ∈ A.

We say that A shares (the cost of a B-edge) if either a B-edge to an ancestor of C is introduced 
by Dfs within a proper descendant D of A (A and D share the cost) or Dfs from an element of A 
introduces a B-edge to a proper ancestor of C (A and C share the cost). Path nodes and triangles 
that share have needs reduced to 1.5 € and 4.5 € respectively, hence they achieve balance.

Case 1: |A| = 1, A = {u}, A does not share.

Because we have requirements contained in i(u) and in o(u), there exist L-edges that enter
and exit u. If an L-edge entering u is secondary or exits a multi-node object, A is rich. Hence we
assume a primary edge (t,u) from object {t}.

Case 1.1: C = {t}. Because A does not share, no edge to an ancestor of t is present in B when the
final check of Dfs(u) is performed, and thus T[u] is already strongly connected. Dfs(u) inserts
LowEdge[u]. If this edge went to a proper ancestor of t, A would share (with {t}). Thus
LowEdge[u] goes to {t} (see Fig. 2). Then o(T[u]) ⊆ i(t), hence o(T[u] ∪ {t}) ⊆ o(t), hence there
must be an L-edge from t to a node different from u, a secondary edge from t.

However, we can eliminate this situation by a rule of selecting the primary edges.

Rule ★: When Dfs visits a path node object {u}, it selects a primary edge, an L-edge (u,v) such
that Dfs(v) is the first recursive call of Dfs(u), in such a way that in V − {u} a proper ancestor
of {u} is reachable from v.

To see that there exists (u, v) that satisfies Rule ★, suppose that L-edges (u, v_i), i = 1, ..., k fail
this rule and S_i is the set of nodes reachable from v_i in V − {u}. Then R = o({u} ∪ S_1 ∪ ... ∪ S_k) ⊆ o(u)
and R must contain a suitable L-edge.

Case 1.2: C ≠ {t}. Initially we pay for C → u and LowEdge[u].

This means that t will be visited later. Because A does not share, it neither helps connecting 
T[u] (which is strongly connected), nor it helps connecting C with its ancestor. 

Thus it is OK when we delete T-edge C → u and we wait until t is visited in the future. Then
Dfs(t) introduces edge (t,u), paid by A using the money for the deleted edge, and the cost of
LowEdge[u] is shared by A and {t}.

Note that our actual algorithm differs from Dfs in two ways: (1) Dfs(t) inserts into B the primary
edge exiting t without waiting for the results of its recursive calls, and (2) we insert this L-edge and
delete a T-edge. We will describe similar deviations in the subsequent cases.

Case 2: |A| = 2, A = {u,v}. Dfs(u) starts with making the call Dfs(v).

We consider what happens when we execute the final check of Dfs(v). If an edge to an ancestor
of C is already introduced, A has to pay for the T-edge C → u, for edge (u,v) and it can "afford" to
pay 1 € for that B-edge (more than the 0.5 € for sharing the cost).

If an edge to u is already introduced, A does not share its cost, while T[v] U {u} is already 
strongly connected. At the end of Dfs(u), A can afford to pay for introducing a B-edge. 

Now we assume that no edge to a proper ancestor of v was introduced before the final check 
of Dfs(v), and thus T[v] is already strongly connected. Thus Dfs(v) inserts LowEdge[v] to B. If 
this edge goes to an ancestor of C, again, A pays for three edges only. 

If LowEdge[v] goes to u, then o(T[v]) ⊆ i(u), hence R = o(T[v] ∪ {u}) ⊆ o(u). Then R contains
an L-edge that exits u and does not go to v, and this means that A is rich.

Case 3: |A| = 3, A = {u,v,w}. While the previous two cases are much simpler than in [15], the
case of |A| = 3 is roughly similar, and we give the details in Appendix A.

3.2 Extending the algorithm for Min-ED to Min-TR_1

When the set of required edges is not empty, D ≠ ∅, the approach in the previous section has
to be somewhat modified. When we form the "lower bound" edge set L we clearly have D ⊆ L, but
the algorithm in some cases fails to include L-edges in the solution. This never happens with L-edges
in paths and rich objects, so it suffices to consider poor digons and triangles, and make the necessary
modifications to our algorithm.

If an L-edge is not "noticed" by the algorithm, then it was not considered in the lower bound 
used to justify the edges of the solution, so when we insert this edge, we can also "notice" its 
contribution to the lower bound. 

3.2.1 Digons with D-edges 

A problematic digon consists of a non-D-edge (u,v) and a D-edge (v,u). If a problematic digon
can be adopted as a digon of L and subsequently it can cause the algorithm for Min-ED to
"malfunction", i.e. to remove its D-edge, we say that it is worrisome. We need to prevent worrisome
digons from being considered as objects by an appropriately modified algorithm for Min-ED.

We proceed in two stages. First, we show how to handle the case of problematic digons that are 
exited or entered with L-edges (including D-edges). The remaining problematic digons are disjoint. 

Because they are disjoint, each of them has to be separately entered and exited, plus we need to
enter the beginning of the D-edge of the digon and exit the end of this edge. Thus among different
ways to enter the digon (or exit) we value more those that satisfy two requirements rather than one.
This gives rise to a maximum weight matching problem. The details are in Appendix B.
3.2.2 Triangles



Our algorithm for Min-ED can be applied to triangles with small modifications. It is still the case
that when a triangle is free we can connect its nodes with the rest of the solution using 4 edges, 
but the argument has to be a bit different in the presence of required edges. 

Thus suppose that we obtained a solution for the complement of a triangle A = {u, v, w} in
which edge (w, u) is required. If we cannot enter A through node v, v has to be entered from inside
A, hence every solution must have two edges inside A, hence we can collapse A in the preprocessing.
The case when there is no exit of A from node u is symmetric. Thus we can enter A through v,
traverse (v, w, u) and exit through u. We say that A is free, and a free triangle has a surplus of
0.5 €.

One can see that a free triangle without required edges and which is not collapsed in
preprocessing also saves an edge and arrives at a surplus.

We can summarize this section with the following theorem: 

Theorem 1 There is a polynomial time algorithm that given an input graph (V, E) and a set of
required edges D ⊆ E produces a transitive reduction H such that D ⊆ H ⊆ E and |H| ≤ 1.5k − 1,
where k is the size of an optimum solution.

The reason for −1 in the statement is that no edges are added for the root object in L, and this
object has at least 2 edges. 

4 2-approximation for Max-TR_1

Theorem 2 There is a polynomial time algorithm that given an input graph (V, E) and a set of
required edges D ⊆ E produces a transitive reduction H such that D ⊆ H ⊆ E and |E − H| ≥ 0.5k + 1,
where k is the value of |E − H| for an optimum solution.

(In the proof, we add in parentheses the parts needed to prove the 0.5k + 1 bound rather than 0.5k.)

First, we determine the necessary edges: e is necessary if e ∈ D or {e} = i(S) for some node set S.
(If there are any cycles of necessary edges, we replace them with single nodes.)

We give a cost of 0 to the necessary edges and a cost of 1 to the remaining ones. Recall
the primal/dual formulations (in particular (P2) and (D2)) of Section 2. We set x_e = 1 if e is a
necessary edge and x_e = 0.5 otherwise. This is a valid solution for the fractional relaxation of the
problem as defined in (P1), because every set i(U) either consists of a single necessary edge or
contains at least two edges.

Now, pick any node r. (Make sure that no necessary edges enter r.) The out-arborescence
problem, as defined in Section 2, is to find a set of edges of minimum cost that provides a path
from r to every other node; edges of cost 0 can be used in every solution. An optimum (integral)
out-arborescence T can be computed in polynomial time by the greedy heuristic in [8]; this algorithm
also provides a set of cuts that forms a dual solution.

Suppose that m edges of cost 1 are not included in T; then no solution can delete more than m
edges (indeed, more than m − 1: to the cuts collected by the greedy algorithm we can add i(r)). Let
us reduce the cost of edges in T to 0. Our fractional solution is still valid for the in-arborescence,
so we can find the in-arborescence with at most m/2 edges that still have cost 1. Thus we delete
at least m/2 edges, while the upper bound is m (m − 1).

(To assure deletion of at least k/2 + 1 edges, where k is the optimum number, we can try in
every possible way one initial deletion. If the optimum number of deletions is k, we are left with
approximating among k − 1; we get an upper bound of at least k − 1 with k edges left for possible
deletions, so we delete at least k/2, plus the initial 1.)
5 Approximating Min-TR_p and Max-TR_p for prime p



We will show how to transform our approximation algorithms for Min-TR_1 and Max-TR_1 into
approximation algorithms for Min-TR_p and Max-TR_p with ratios 1.5 and 2 respectively. For
simplicity, we discuss the case of Min-TR_p, but every statement applies to Max-TR_p as well.

In a nutshell, we can reduce the approximation in the general case to the case of a strongly
connected graph, and in a strongly connected graph we will show that a solution to Min-TR_1
can be transformed into a solution to Min-TR_p by adding a single edge, and in polynomial time
(proportional to p) we can find that edge.

In turn, when we run an approximation algorithm within strongly connected components,
we obtain its approximation ratio even if we add one extra edge (it is actually a property of our
algorithms, but in any case one can try to guess correctly several solution edges and save an additive
constant from the approximation).

Consider an instance (V, E, ℓ, D) of Max-TR_p. The following proposition says that it suffices
to restrict our attention to strongly connected components of (V, E) (see footnote 3).

Proposition 3 [2] Let ρ > 1 be a constant. If we are given a ρ-approximation of Max-TR_p for
each strongly connected component of (V, E), we can compute in polynomial time a ρ-approximation
for (V, E).

The following characterization of strongly connected graphs appears in [2]. 

Lemma 4 [2] Let (C, E[C], ℓ, D) be an instance of Max-TR_p. Every strongly connected component
C of (V, E) is one of the following two types:

(Multiple Parity Component) |{a ∈ Z_p : (u, v, a) ∈ Closure_ℓ(E)}| = p for any u, v ∈ C;

(Single Parity Component) |{a ∈ Z_p : (u, v, a) ∈ Closure_ℓ(E)}| = 1 for any u, v ∈ C.

Moreover, C is a multiple parity component if and only if it contains a simple cycle of non-zero
parity.

Based on the above lemma, we can use the following approach. Consider an instance (V, E, ℓ, D)
of Min-TR_p. For every strongly connected component C ⊆ V we consider an induced instance of
Min-TR_1, (C, E(C), D ∩ C). We find an approximate solution A_C that contains an out-arborescence
T_C with root r. We label each node u ∈ C with ℓ(u) = ℓ(P_u) where P_u is the unique path in T_C
from r to u.

Now for every (u,v) ∈ E(C) we check if ℓ(v) = ℓ(u) + ℓ(u,v) mod p.

If this is true for every e ∈ E(C) then C is a single parity component. Otherwise, we pick a
single edge (u,v) violating the test and we insert it into A_C. This is sufficient because A_C contains
a path Q from v to r, and the cycles (P_u, (u,v), Q) and (P_v, Q) have different parities, hence one
of them is non-zero.
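The labeling test can be sketched in a few lines. The code below is an illustration only; the function name `find_parity_violation` and the input encoding (a list of arborescence edges, a list of component edges and a dictionary of labels) are assumptions made for this example.

```python
def find_parity_violation(root, tree_edges, edges, label, p):
    """Label nodes along the out-arborescence, then return a violating edge or None."""
    children = {}
    for a, b in tree_edges:
        children.setdefault(a, []).append(b)
    node_label = {root: 0}                    # label of the empty path from the root
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            node_label[v] = (node_label[u] + label[(u, v)]) % p
            stack.append(v)
    for (u, v) in edges:
        if node_label[v] != (node_label[u] + label[(u, v)]) % p:
            return (u, v)                     # inserting this edge into A_C suffices
    return None                               # single parity component


edges = [("r", "a"), ("a", "b"), ("b", "r"), ("a", "r")]
tree = [("r", "a"), ("a", "b")]
label = {("r", "a"): 1, ("a", "b"): 1, ("b", "r"): 0, ("a", "r"): 0}
print(find_parity_violation("r", tree, edges, label, 2))
```

In this example the cycle (r, a, r) has parity 1, so the component is a multiple parity component and the edge (a, r) is reported.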

6 Integrality Gap of the LP Formulation for Min-ED and Max-ED

Lemma 5 The primal LP formulation for Min-ED and Max-ED has an integrality gap of at least 
4/3 and 3/2, respectively. 

3 The authors in [2] prove their result only for Min-TR p , but the proof applies to Max-TR p as well. 






We use the same construction for Min-ED and Max-ED. Our graph will consist of 2n nodes.
We first define 2n + 2 nodes of the form (i, j) where 0 ≤ i < 2 and 0 ≤ j ≤ n. Later we will collapse
together nodes (0,0) and (1,0) into node 0, as well as nodes (0,n) and (1,n) into node n. We have
two types of edges: ((i, j), (i, j + 1)) and ((i, j), (1 − i, j − 1)).

We get a fractional solution by giving coefficient 0.5 to every edge. We need to show that
U ≠ V and U ≠ ∅ implies |i(U)| ≥ 2. Suppose {0, (0,1), ..., (0,n − 1), n} ⊆ U; let j be the
least number such that (1, j) ∉ U; then o(U) contains ((1, j − 1), (1, j)) and ((0, j + 1), (1, j)), and
|i(U)| = |o(U)| because every node has in-degree 2 and out-degree 2. A
symmetric argument holds if {0, (1,1), ..., (1,n − 1), n} ⊆ U. In the remaining case, the edge disjoint
paths (0, (i,1), ..., (i,n − 1), n), i = 0, 1, contain edges from i(U).
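For small n the cut property can also be checked exhaustively. The sketch below builds the graph under the reading that the second edge type goes from (i, j) to (1 − i, j − 1); the names `gap_graph` and `min_in_cut` are hypothetical, and the enumeration of all node subsets is exponential, so it is meant only as a sanity check.

```python
from itertools import combinations


def gap_graph(n):
    """Build the 2n-node graph; the nodes with j = 0 and j = n are collapsed into 0 and n."""
    def name(i, j):
        return j if j in (0, n) else (i, j)

    edges = set()
    for i in (0, 1):
        for j in range(n + 1):
            if j < n:
                edges.add((name(i, j), name(i, j + 1)))        # first edge type
            if j > 0:
                edges.add((name(i, j), name(1 - i, j - 1)))    # second edge type
    nodes = {x for e in edges for x in e}
    return list(nodes), list(edges)


def min_in_cut(nodes, edges):
    """Minimum of |i(U)| over all proper non-empty node subsets U (brute force)."""
    best = len(edges)
    for k in range(1, len(nodes)):
        for subset in combinations(nodes, k):
            U = set(subset)
            best = min(best, sum(1 for a, b in edges if a not in U and b in U))
    return best


nodes, edges = gap_graph(3)
print(len(nodes), len(edges), 0.5 * len(edges), min_in_cut(nodes, edges))
```

With every edge at 0.5, a minimum in-cut of 2 means that each constraint i(U) · x ≥ 1 is satisfied, and the cost 0.5 · 4n equals 2n.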

The cost of this fractional solution is 2n (two edges from each of the 2n nodes, times 0.5). We
will show that the minimum integer solution costs ⌈(8n − 4)/3⌉. For this, it suffices to show that
no simple cycle in this graph has length exceeding 4 nodes.

If no two consecutive edges in a cycle increase the value of the second coordinate, no pair of edges
increases this value, so we have at most two such values, hence at most 4 nodes. Alternatively,
if a cycle has two such edges, say the path ((0, i − 1), (0, i), (0, i + 1)), it has to return without using
edges that are incident to (0, i), so it has to use ((0, i + 1), (1, i)) and ((1, i), (0, i − 1)), so it is a
cycle of length 4, ((0, i − 1), (0, i), (0, i + 1), (1, i)).

Every edge of a minimum solution belongs to a simple cycle contained in that solution; we can
start with set {0} and extend it using an edge going from the current set, and a simple cycle that
contains that edge; if we add k edges to the solution we add k - 1 nodes, and k ≤ 4; thus the
average cost of adding a node must be at least 4/3 edges. This completes the proof for Min-ED.
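
The counting step above can be written out as follows. This is only a sketch of the arithmetic: the index t ranges over the successive cycle extensions, and k_t (a symbol introduced here, not in the original) is the number of edges added in extension t, while A denotes the integer solution.

```latex
\[
  \frac{k_t}{k_t - 1} \;\ge\; \frac{4}{3} \quad \text{for } 2 \le k_t \le 4,
  \qquad
  \sum_t (k_t - 1) = 2n - 1
  \;\Longrightarrow\;
  |A| = \sum_t k_t \;\ge\; \frac{4}{3}\,(2n - 1) = \frac{8n - 4}{3}.
\]
```

Since |A| is an integer, this gives the bound ⌈(8n - 4)/3⌉, to be compared with the fractional cost 2n.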

When we analyze this example for Max-ED, fractional relaxation allows 2n deletions while
actually we can perform only (4/3)n of them, so the ratio is 2/(4/3) = 3/2.
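
For small n, the two structural facts used in this proof (every cut is entered by at least two edges, and no simple cycle has more than 4 nodes) can be checked by brute force. The following is a minimal sketch, not part of the original argument; the function names `gap_graph`, `min_entering` and `longest_simple_cycle` are hypothetical, and it assumes n ≥ 2 so that collapsing creates no parallel edges.

```python
from itertools import combinations

def gap_graph(n):
    """Build the collapsed two-row graph on 2n nodes (assumes n >= 2)."""
    def name(i, j):
        # Columns 0 and n are collapsed across the two rows into nodes 0 and n.
        return j if j in (0, n) else (i, j)
    edges = set()
    for i in (0, 1):
        for j in range(n + 1):
            if j < n:
                edges.add((name(i, j), name(i, j + 1)))      # forward edge, same row
            if j > 0:
                edges.add((name(i, j), name(1 - i, j - 1)))  # backward edge, other row
    nodes = {u for e in edges for u in e}
    return nodes, edges

def min_entering(nodes, edges):
    """Smallest number of edges entering U over all nonempty proper subsets U."""
    nodes = list(nodes)
    best = len(edges)
    for r in range(1, len(nodes)):
        for subset in combinations(nodes, r):
            U = set(subset)
            best = min(best, sum(1 for (u, v) in edges if u not in U and v in U))
    return best

def longest_simple_cycle(nodes, edges):
    """Number of nodes on a longest simple cycle, by exhaustive search."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    best = 0
    def dfs(start, u, visited):
        nonlocal best
        for v in succ[u]:
            if v == start:
                best = max(best, len(visited))
            elif v not in visited:
                dfs(start, v, visited | {v})
    for s in nodes:
        dfs(s, s, {s})
    return best

nodes, edges = gap_graph(3)
print(len(nodes), len(edges), min_entering(nodes, edges), longest_simple_cycle(nodes, edges))
```

Both searches are exponential, so this is only meant as a sanity check on very small instances.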

7 MAX-SNP-hardness Results 

Theorem 6 Let k-Min-ED and k-Max-ED be the Min-ED and Max-ED problems, respectively,
restricted to graphs in which the longest cycle has k edges. Then, 5-Min-ED and 5-Max-ED are
both MAX-SNP-hard.

Khuller et al. used a reduction from 3-SAT to Min-ED. We are basically using their reduction, 
but we restrict its application to sets of clauses in which each literal occurs exactly twice (and each 
variable, four times). It was shown in [1] that this restriction yields a MAX-SNP hard problem. 
Details are in Appendix C. 
Appendix A: Algorithm for Min-ED, triangle cases



Case 3: |A| = 3, A = {u,v, w} 

We use the following preprocessing. If a triple of nodes can be connected with a cycle (triangle)
A, and it must contain at least two solution edges, we contract it to a single node (decreasing the
optimum cost by at least 2), find a solution and then we insert triangle A to the solution (increasing
the cost by 3). Clearly, this preserves approximation ratio 1.5.
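
The claim about the ratio can be spelled out in one line. In this sketch OPT and OPT' denote the optimum cost before and after the contraction, and ALG' the cost of the solution found for the contracted graph; these three symbols are introduced here for illustration.

```latex
\[
  \mathrm{OPT}' \le \mathrm{OPT} - 2
  \quad\text{and}\quad
  \mathrm{ALG}' \le 1.5\,\mathrm{OPT}'
  \;\Longrightarrow\;
  \mathrm{ALG}' + 3 \;\le\; 1.5\,(\mathrm{OPT} - 2) + 3 \;=\; 1.5\,\mathrm{OPT}.
\]
```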

We assume that (u,v,w) is an oriented cycle, so Dfs(u) starts with a call to Dfs(v), which 
starts with a call to Dfs(w). 

Case 3.1: A contains an endpoint of a primary L-edge. We can repeat the reasoning of Case 1,
assuming that this is an edge from a path node t, etc.

Case 3.2: Assume that the solution remains strongly connected when we remove A. We say that
A is free. We will show that A can have a surplus of 0.5 € by traversing A as follows: from V - A,
two edges inside A, back to V - A, so the balance is 4.5 - 4 = 0.5 €.

Suppose that all edges (V - A) → A enter through the same node, then an L-edge has to enter
A, and either A is rich or we have Case 3.1. Similarly, if all edges A → (V - A) exit from the same
node, A is rich.

If we can exit from node w, use the path (V - A) → u → v → w → (V - A). Otherwise we can
assume exits from both u and v.

If we can enter A at node v, use the path (V - A) → v → w → u → (V - A) and if we can
enter A at node w, use (V - A) → w → u → v → (V - A).

Case 3.3: In the remaining case, no L-edge enters or exits A, edges enter A at two nodes, and exit 
at two nodes, A does not share and A is not free. 

One conclusion is that Dfs(u) must visit other objects besides A. Therefore T[u] has branches
that extend beyond A. We will consider the structure of those branches. A branch is completed
when a B-edge to a proper ancestor of its first object is inserted; this merges some scc's, say
D_1, ..., D_k, into the ancestral scc, so it uses k + 1 edges outside the D_i's. If such a branch has a surplus
we transfer 0.5 € to A which completes its accounting. Our goal is to show that such a surplus
exists, or we can make A share, or we can make A free.

Case 3.3.1: k > 2. Each D_i has a "local surplus" of at least 0.5 € after paying for the incoming
edge, and this includes the case of nodes of a digon; a digon has 3 € and two nodes. Thus we can
pay for the last edge and the branch still has positive surplus.

Case 3.3.2: k = 2 and either D_1 or D_2 is rich. The accounting is the same as in Case 3.3.1.

Case 3.3.3: D_1 is a path node, say t. Then there exists an L-edge (s,t) and because A is not rich,
s ∉ A. Delete A → t and backtrack from t using L-edges until you leave T[u] or you encounter a
cycle object, say F.

If you leave T[u] while backtracking, we can replace C → u with the edge that "left T[u]
backwards". Because we removed edge A → t, this improves the balance by 1 €.

If you reach a cycle object inside T[u] via a primary edge, we obtain a branch that goes through
rich F and t, so we have Case 3.3.1 or 3.3.2. If we reach F via a secondary edge F → s, s is "super
rich" with 3 € and it can transfer 0.5 € to A.

Case 3.3.4: D_2 is a path node, say t, while D_1 is not. Because D_1 is not rich and not a digon, it
is a triangle. We repeat the reasoning of the previous case, while D_1 becomes a free triangle.

Case 3.3.5: Remaining case: each branch either has a single scc D_1, which is rich (otherwise it is
a free triangle), or two triangle scc's, or two single-node scc's that together form a digon.

Case 3.3.5.1: Open case: there exists an edge between V - T[u] and T[u] - A. We will analyze
the case of an edge (V - T[u]) → (T[u] - A), the other case is symmetric.

If this edge enters some D_1, then we insert it and delete two edges, C → A and A → D_1.

If this edge enters some D_2 where D_1, D_2 is a pair of triangles, we insert it, remove C → A,
A → D_1 and D_1 → D_2 and thus we make D_1 a free triangle, so restoring the connections of
A → D_1 → D_2 will save an edge.

If this edge enters some D_2 = {t_2} where D_1 = {t_1} and {t_1,t_2} is a digon, we insert this edge,
remove A → t_1 and t_2 → A and insert an edge (t_1, s) for some s ≠ t_2. It is not possible that exits
from t_1 are restricted to t_2, as we would have o({t_1,t_2}) ⊆ o(t_2), necessitating an L-edge exiting
the digon and the digon would be rich.

If s ∈ T[u], we can delete C → A and save. Otherwise the progress is in making the set T[u] - A
smaller.

Case 3.3.5.2: Closed case. The edges between V - T[u] and T[u] must include nodes in A. We
can free A: there must be at least two nodes in A that are entered by such edges, otherwise the
edge C → A is an L-edge, which is Case 3.1. Similarly, there must be two nodes in A from which such edges
can exit. So we can repeat the reasoning of Case 3.2.

Remark 1. When we discuss subcases of Case 3.3, we assumed that the scc's that are coalesced
when a branch is completed are of one of the "basic types". Actually, they can have a nested
structure, following the recursive nature of Dfs. If this is the case, we can identify an scc with
its root object. If the root is rich, then D_i inherits the initial balance of the root, so it is rich. If
the root is a triangle, D_i inherits its balance as well; additionally, making D_i free has the same
effect as having free T[u], a subtree rooted by A that is discussed in those cases. Because we want
to gain by making a subtree rooted by a triangle free, we reduce the problem to that of a smaller
subtree with the same property.

The cases of path nodes, 3.3.3 and 3.3.4 are not altered if these path nodes are roots of larger 
scc's, and neither is Case 3.3.5.2. 

Finally, the open case with digon (3.3.5.1) has to be elaborated when we have an edge from
"outside" to the subtree rooted at D_2 = {t_2}. Our argument was that we can proceed to t_1, and if
we cannot exit from t_1 to a node different than t_2 then the digon {t_1,t_2} is rich as o(t_1,t_2) ⊆ o(t_2).
This argument does not work if we try to apply it to T[t_1] and T[t_2] rather than to t_1 and t_2.

However, if o(T[t_1] ∪ {t_2}) ⊆ o(t_2) the argument is still valid, so it remains to address the
case when we have an edge from T[t_1] to T[t_2] - {t_2}. Then rather than using the edge from outside
of T[u] we change the depth first search as follows: when we reach t_1, we give the edge (t_1,t_2) the
last priority, and the resulting T[t_1] will contain some part of T[t_2] and t_2 itself. Thus we get a
path from A to A that includes at least 3 objects: t_1, t_2 and an object that belonged to T[t_2], and
this path delivers a surplus to A.

Appendix B: Algorithm for Min-TR_1, digon cases

A problematic digon consists of a non-D-edge (u,v) and a D-edge (v,u). If a problematic digon
can be adopted as a digon of L and subsequently it can cause the algorithm for Min-ED to
"malfunction", i.e. to remove its D-edge, we say that it is worrisome. We need to prevent worrisome
digons from being considered as objects by an appropriately modified algorithm for Min-ED.

Case A: Suppose that a D-edge e exits or enters a problematic digon A = {u, v}.

Case A1: e exits A. An L-edge that exits a digon makes it rich, more precisely, it provides this
digon with 1 €.

Case A2: e enters A and it is a secondary edge, or it originates in a triangle. 

We can change the rule allocating the money of a secondary edge so that the problematic target 
gets 1 € (the current rule only allocates 0.5 € for such an edge to its ending, if the latter is a path 
node). Similarly, for a primary edge that exits a triangle it suffices to give 0.5 € to the triangle, 
leaving 1 € for the problematic target. 
Thus it suffices to consider the cases when a problematic digon {u,v} is
entered by a primary edge e, from a digon or a path node.

Case A3: e enters A from a path node, {t}. We can show that for an object B that is not a path
node, if there exists a primary edge u → B, where u is a path node, we can reduce the expenses of
B to 1 €.

Case A3.1: Suppose that Dfs finds B before u.

Case A3.1.1: Before u is visited, Dfs produces a back edge b for B.

Case A3.1.1.1: The cost of b is shared by B with another object. Then the expenses of B are 1.5 €,
so it needs to obtain 0.5 € from {u}. Later, when Dfs finds {u}, suppose it is done by traversing an
edge not from the "primary predecessor" of u, hence from some cyclic object B'; then {u} and B'
share the cost of connecting to an ancestor; at the time of Dfs(u), sccjub(B) is this ancestor.
Inductively, "compound object" B' ∪ {u} ∪ B will receive 0.5 € later.

On the other hand, if Dfs finds {u} by traversing an edge from its "primary predecessor" that
is also a path node, say s, we apply the same argument (s and t share the cost and will receive 0.5 €
later). Finally, if {t} is entered via a secondary edge or from a cycle, we collect the promised 0.5 €.

Case A3.1.1.2: The cost of b is not shared by B with another object. We have the same reasoning
as in Case A3.1.1.1, but the initial expenses of B are 2 €, while later we can delete the initial edge
used to enter B, so the initial cost increases by 0.5 € and subsequent savings by 1 €.

Case A3.2: B is visited after {u}, so Dfs(B) is the first step of Dfs(u). A back edge from a
descendant of B to {u} may create B', and the same applies to a back edge from B to the predecessor
of u, while a back edge from B to u implies that there is no other way of exiting T[B], and, like in
Case 1.1, we can conclude that u → B is not a primary edge.
Case A4: e enters A from a digon C. 

Case A4.1: Dfs finds A before C. If A shares the cost of a back edge with another object, then
its expenses are 1.5 €, and when C is visited, it uses the connecting L-edge to its ancestor that
contains A, so it pays only for entering; together C and A have 5 edges and have expenses of 2.5 €
(not counting the edges that they have and which they do not delete). If A makes a back edge
without sharing the cost, then when C is visited we delete the edge used to enter A. It remains to
consider the case when C is visited during Dfs(A), and with the "obligatory" edge C → A this
is closing a circuit that consists of A, B_1, ..., B_k, C. This circuit has k + 2 objects, is closed with
k + 1 edges (not including the "obligatory" edge C → A), and it needs to be entered and exited.
The intermediate objects contribute k - 1 to the needs and 1.5k € to the amount of money to pay
for these needs, while A and C contribute 2.5 €, so if k > we have a balance as 1.5k + 2.5 > k + 4.
If k = 0, this means that we enter C with an edge from A. Note that it is still possible that A will
share the cost of a back edge and we get a balance. If not, V - A - C is strongly connected in T ∪ B,
so we can try to improve the solution by replacing edges → A → C with → C. If not possible, C
can be accessed only from A.

Now we have to ask: why is edge C → {u,v} in L? If this is because of a requirement that
can be satisfied by an edge that leaves C ∪ A, then we will make such a replacement, and we now
use a "macro-path" → A → C →, with 2 new edges and 1 edge replaced. If this is because of a
requirement that must be satisfied by a C → A edge, we have no "nice" requirement that requires
an A → C edge; consequently we can choose an edge A → C in such a way that we could delete an
edge inside C. This will not work only if C contains a D-edge. Summarizing, A ∪ C must contain
D-edges inside A and inside C, and also edges A → C and C → A, while we can connect them using
6 edges. In such a case we collapse A ∪ C to a single node, which removes at least 4 edges from an
optimum solution, and then we can modify the resulting solution by adding 6 edges that connect
A ∪ C.

Case A4.2: Dfs finds C before A. Because C is a rich digon, we treat it as a single node and 
Dfs(C) starts by visiting A. 
If A shares the cost of a back edge with a descendant, A and C together spend 2.5 € on new
edges, and if A shares this cost with C, they spend 2 €. If A does not share the cost but C is a
non-problematic digon, the combination of edges C → A → C can remove an edge in C. Lastly, if
C is a problematic digon and A cannot share, we again have the case that A ∪ C must contain at
least 4 edges of an optimum solution and we can collapse A ∪ C and connect it using 6 edges.

After this case analysis, we can conclude that using collapsing of pairs and quadruples we can
eliminate the possibility of worrisome digons that are entered by D-edges or which emanate D-edges.
The remaining worrisome digons are disjoint. Moreover, if {u,v} is a problematic digon and
we have more than one requirement contained in o(u), the digon is not worrisome, and the same
is true for i(u), o(v) and i(v).

Now we are altering the linear program: P3 keeps those requirements Rx ≥ 1 of P2
that for some node u satisfy R ⊆ o(u) or R ⊆ i(u), let us call them The dual D3 is to find a
maximum size collection of requirements of P3 such that no two can be satisfied by a single edge
(i such requirements ⟹ we need i edges).

We will form a new dual program, D4. For a node w that does not belong to a worrisome digon,
as before we introduce requirements contained in o(w) and in i(w). For a worrisome digon with
D-edge (v,u), we have the one-edge requirement {(v,u)}, and thus no other requirements contained
in o(v) and i(u). Also, rather than having requirements o(u) and i(v) we form requirements as
follows:

• if there is more than one requirement contained in o(u,v) = o({u,v}) we will have these
requirements;

• if there is only one minimal requirement R contained in o(u,v), we will form a sibling pair of
requirements: R, for simplicity referred to as o(u,v), as well as o(u), both with coefficient
0.5;

• if there is more than one requirement contained in i(u,v) we will have these requirements;

• if there is only one minimal requirement R contained in i(u,v), we will have a sibling pair of
requirements with coefficients 0.5: R, referred to as i(u,v), as well as i(v).

In P4 an edge can satisfy more than two requirements, but the sum of coefficients satisfied by 
an edge is at most two. 

Assume that the sum of coefficients of requirements in P4 is N and consider a set of edges L* 
that satisfies all of them. For every requirement we give credit to one of the L*-edges that satisfy 
them. Say that a net credit of an edge is the sum of credits it received minus one; then the size of 
L* is N minus the sum of net credits of edges of L*. 
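
The identity behind the last sentence can be written out explicitly. Here c(e) is a symbol introduced for this sketch, denoting the total credit received by edge e, and the only assumption is that each requirement hands its full coefficient to exactly one edge of L*.

```latex
\[
  \sum_{e \in L^*} c(e) = N
  \;\Longrightarrow\;
  \sum_{e \in L^*} \bigl(c(e) - 1\bigr) = N - |L^*|
  \;\Longrightarrow\;
  |L^*| = N - \sum_{e \in L^*} \bigl(c(e) - 1\bigr).
\]
```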

Now, consider a set of edges M of L* with positive net credits; the lower bound for the size of 
L* is N minus the sum of net credits in M. We can assign credits in such a way that M can be 
viewed as a matching. 

Consider a sibling pair of requirements, o(u) and o(u,v) (the same observation will hold for i(v) 
and i(u,v)). The only way to satisfy the former without satisfying the latter is with edge (u,v); 
this edge satisfies only two requirements, both with coefficient 0.5, so it cannot belong to M. Thus 
if an edge in M gets credit for satisfying o(u) we can also give it the credit for satisfying o(u,v); 
in this manner only one edge can get credit from a sibling pair of requirements. 

This defines a bipartite graph: "nodes" are requirements with coefficient 1 and sibling pairs, 
edges are pairs of "nodes" in which requirements can be satisfied simultaneously, edge weights are 
net credits. We can find M as a matching in this graph with the maximum weight. 
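
To illustrate how a maximum weight matching turns into a lower bound of the form N minus the total net credit, here is a minimal sketch under stated assumptions. It uses exhaustive search rather than a polynomial-time matching algorithm, the names `max_weight_matching` and `lower_bound` are hypothetical, and the requirement labels in the example are made up for illustration.

```python
def max_weight_matching(edges):
    """edges: list of (a, b, weight). Return (best_weight, chosen_edges) by exhaustive search."""
    best = (0, [])

    def rec(idx, used, weight, chosen):
        nonlocal best
        if weight > best[0]:
            best = (weight, list(chosen))
        for t in range(idx, len(edges)):
            a, b, w = edges[t]
            # An edge may be added only if neither endpoint is already matched.
            if a not in used and b not in used:
                chosen.append(edges[t])
                rec(t + 1, used | {a, b}, weight + w, chosen)
                chosen.pop()

    rec(0, frozenset(), 0, [])
    return best

def lower_bound(total_coefficients, edges):
    """N minus the largest total net credit that a matching can collect."""
    weight, _ = max_weight_matching(edges)
    return total_coefficients - weight

# Hypothetical example: four requirements with coefficient 1 (so N = 4);
# each listed pair can be satisfied by one edge, giving net credit 2 - 1 = 1.
pairs = [("o(a)", "i(b)", 1), ("o(a)", "i(c)", 1), ("o(d)", "i(b)", 1)]
print(lower_bound(4, pairs))
```

The search is exponential in the number of pairs, so it only serves to make the bookkeeping concrete on toy inputs.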

Once we find M, we can complete the lower bound by taking account of the credits not
distributed to M:






(a) If we have a requirement with coefficient 1 that did not give credit to M, we can add an 
edge that satisfies it, the same can be done for a sibling pair of requirements that did not give any 
credits to M (note that we can satisfy a sibling pair with a single edge). 

(b) In the case of a sibling pair that gave 0.5 credit to M, its digon retains 0.5 credit, and we
use it to collapse this digon to a single node: collapsed digon uses two edges, and it has 1.5 € for
its D-edge and 0.5 € for its remaining credit.

After applying (a-b), all credits are used up and all requirements of P4 are satisfied, either
within the computed lower bound, or using the "extra credit" from collapsed D-edges. Moreover,
for each new node x_uv (a collapsed worrisome digon) we have edges that satisfy all the minimal
requirements contained in o(x_uv) and in i(x_uv).

However, we may have the following pathology: in the resulting set of edges L we may have 
nodes with no edge exiting (or no edge entering). For no edge entering, it may happen like that: 
for a worrisome digon {u,v} with D-edge (v,u) we have multiple requirements contained in i(u, v), 
and we satisfied all of them with edges that enter u. We will try to correct this as follows: if one of 
these edges exits a cycle, we replace it with an edge that also satisfies this requirement, but which 
enters v. Thus the pathology remains if none of at least two L-edges entering {u, v} exits a cycle. 
One of these edges can be on a cycle that contains u, but at least one enters {u, v} from a path 
node (or a worrisome digon that can be considered like a path node). We can collapse a worrisome 
digon that exhibits such a pathology, and because it becomes either a path node with two edges 
entering from other path nodes, or a part of a cycle that is entered with an edge from a path node, 
we can collect 0.5 € for the use within {u,v}. 

Of course, a similar pathology and its resolution may happen when we have multiple L-edges 
exiting {u, v}. 

Appendix C: MAX-SNP-hardness Results 

Theorem 7 Let k-Min-ED and k-Max-ED be the Min-ED and Max-ED problems, respectively,
restricted to graphs in which the longest cycle has k edges. Then, 5-Min-ED and 5-Max-ED are
both MAX-SNP-hard.

We will use a single approximation reduction that reduces 2Reg-Max-SAT to 5-Min-ED and 
5-Max-ED. 

In the Max-SAT problem the input is a set S of disjunctions of literals, a valid solution is an
assignment of truth values (a mapping from variables to {0, 1}), and the objective function is the
number of clauses in S that are satisfied. 2Reg-Max-SAT is Max-SAT restricted to sets of clauses
in which every variable x occurs exactly four times (of course, if it occurs at all), twice as literal x,
twice as literal x̄. This problem is MAX-SNP hard even if we impose another constraint, namely
that each clause has exactly three literals [3].
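
The definition can be made concrete with a small checker. This is an illustrative sketch only: the encoding of literals as signed integers (v for x_v, -v for its negation) and the function names `is_2reg` and `satisfied` are assumptions of the example, not notation from the paper.

```python
from collections import Counter

def is_2reg(clauses):
    """True if every occurring variable appears twice positively and twice negated."""
    counts = Counter(lit for clause in clauses for lit in clause)
    variables = {abs(lit) for lit in counts}
    return all(counts[v] == 2 and counts[-v] == 2 for v in variables)

def satisfied(clauses, assignment):
    """Number of clauses satisfied by assignment (dict: variable -> bool)."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# A toy instance with 3 variables and 4 clauses of three literals each.
clauses = [(1, 2, 3), (1, -2, -3), (-1, 2, -3), (-1, -2, 3)]
print(is_2reg(clauses))
print(satisfied(clauses, {1: True, 2: True, 3: True}))
```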

Consider an instance S of 2Reg-Max-SAT with n variables and m clauses. We construct a
graph with 1 + 6n + m nodes and 14n + m edges. One node is h, the hub. For each clause c we
have node c. For each variable x we have a gadget G_x with 6 nodes, two switch nodes labeled x?,
two nodes that are occurrences of literal x and two nodes that are occurrences of literal x̄.

We have the following edges: (h,x?) for every switch node, (c,h) for every clause node, (l,c)
for every occurrence l of a literal in clause c, while each node gadget is connected with 8 edges as
shown in Fig. 3.

We will show that

(i) if we can satisfy k clauses, then we have a solution of Min-ED with 8n + 2m - k edges, which
is also a solution of Max-ED that deletes 6n - m + k edges;

(ii) if we have a solution of Min-ED with 8n + 2m - k edges, we can show a solution of
2Reg-Max-SAT that satisfies k clauses.

To show the first claim, we take a truth assignment and form an edge set as follows: include all edges from
h to switch nodes (2n edges) and from clauses to h (m edges). For a variable x assigned as true
pick set A_x of 6 edges forming two paths of the form (x?, x̄, x, c), where c is the clause where literal
x occurs, and if x is assigned false, we pick set A_x̄ of edges from the paths of the form (x?, x, x̄, c)
(6n edges). At this point, the only nodes that are not on cycles including h are nodes of unsatisfied
clauses, so for each unsatisfied clause c we pick one of its literal occurrences, l, and add edge (l, c)
(m - k edges).
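
The edge bookkeeping of this construction can be tallied mechanically. The sketch below only reproduces the counts stated in the text (it does not build the gadget, whose wiring is given in the figure); the function name `reduction_counts` and the dictionary keys are hypothetical.

```python
def reduction_counts(n, m, k):
    """Counts for n variables, m clauses and k satisfied clauses."""
    # hub->switch (2n), clause->hub (m), literal->clause (4n), gadget edges (8n)
    total_edges = 2 * n + m + 4 * n + 8 * n
    # forced edges (2n + m), one 6-edge set per variable (6n), one edge per unsatisfied clause
    kept = 2 * n + m + 6 * n + (m - k)
    return {
        "nodes": 1 + 6 * n + m,
        "edges": total_edges,
        "kept": kept,
        "deleted": total_edges - kept,
    }

print(reduction_counts(3, 4, 4))
```

The totals agree with the closed forms 14n + m, 8n + 2m - k and 6n - m + k used in the text.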

To show the second claim, we take a solution D of Min-ED. D must contain all 2n + m edges of the form
(h,x?) and (c,h). Let D_x be the subset of D consisting of edges that are incident to the literals of
variable x and let C be the set of clause nodes.

Simple inspection of cases shows that if |D_x| = 6 then D_x = A_x or D_x = A_x̄.

If |D_x| ≥ 8 we replace D_x with A_x and two edges x̄ → C.

If D_x contains i edges to C, then |D_x| ≥ 4 + i, because besides these edges D_x contains 4 edges
to the literals of x. If i = 4 we are in the case already discussed. If i = 3, suppose that a clause
where x̄ occurs has no incoming edge in D_x; we can replace D_x with A_x plus one edge to a clause
in which x̄ occurs. If a clause where x occurs has no incoming edge in D_x, we perform a symmetric
replacement of D_x.

In the remaining case i ≤ 2 and |D_x| = 7, and we can perform a replacement as in the case of
i = 3.

After all these replacements, the size of D did not increase while each D_x has the form of A_x or
A_x̄ plus some edges to C. If A_x ⊆ D_x we assign x to true, otherwise to false. Clearly, if the union
of D_x's has 6n + m - k edges, at most m - k clauses are not satisfied by this truth assignment
(those entered by "some other edges to C"), so if |D| = 8n + 2m - k, at least k clauses are satisfied.

Berman et al. [1] have a randomized construction of 2Reg-Max-SAT instances with 90n
variables and 176n clauses for which it is NP-hard to tell if we can leave at most εn clauses
unsatisfied or at least (1 - ε)n. The above construction converts it to graphs with (14 × 90 + 176)n
edges in which it is NP-hard to tell if we need at least (8 × 90 + 176 + 1 - ε)n edges or at most
(8 × 90 + 176 + ε)n, which gives a bound on approximability of Min-ED of 1 + 1/896, and 1 + 1/539
for Max-ED.
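
The two constants can be rechecked with exact arithmetic. This is a small sketch, taking ε → 0 and working per unit of n; the function name `hardness_ratios` and its parameter names are made up for the example.

```python
from fractions import Fraction

def hardness_ratios(variables=90, clauses=176):
    """Return the limiting ratios for Min-ED and Max-ED as epsilon tends to 0."""
    total = 14 * variables + clauses   # edges of the constructed graph
    yes = 8 * variables + clauses      # solution size when almost all clauses are satisfiable
    no = yes + 1                       # solution size when about n clauses stay unsatisfied
    min_ed = Fraction(no, yes)
    max_ed = Fraction(total - yes, total - no)  # ratio of the numbers of deleted edges
    return min_ed, max_ed

min_ed, max_ed = hardness_ratios()
print(min_ed - 1, max_ed - 1)
```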

Figure 3: Illustration of our reduction. Marked edges are
necessary. Dash-marked edges show set A_x that we can
interpret as x = true. If some k clause nodes are not reached
(i.e., the corresponding clauses are not satisfied) then we need
to add k extra edges. Thus, k unsatisfied clauses correspond
to 8n + m + k edges being used (6n - k deleted) and k
satisfied clauses correspond to 8n + 2m - k edges being used
(6n - m + k deleted).




□ 